Adapting the PULS event extraction framework to analyze Russian text

Authors

  • Lidia Pivovarova
  • Mian Du
  • Roman Yangarber
Abstract

This paper describes a plug-in component to extend the PULS information extraction framework to analyze Russian-language text. PULS is a comprehensive framework for information extraction (IE) that is used for analysis of news in several scenarios from English-language text and is primarily monolingual. Although monolinguality is recognized as a serious limitation, building an IE system for a new language from the bottom up is very labor-intensive. Thus, the objective of the present work is to explore whether the base framework can be extended to cover additional languages with limited effort, and to leverage the preexisting PULS modules as far as possible, in order to accelerate the development process. The component for Russian analysis is described and its performance is evaluated on two news-analysis scenarios: epidemic surveillance and cross-border security. The approach described in the paper can be generalized to a range of heavily inflected languages.

Similar resources

Generating a dictionary of control models for event extraction

A subordination dictionary is important in a number of text processing applications. We present a method for generating such dictionary for Russian verbs using Google Books Ngram data. An intended purpose of the dictionary is an event extraction system for Russian that uses the dictionary to define extraction patterns.

Filtered Ranking for Bootstrapping in Event Extraction

Several researchers have proposed semi-supervised learning methods for adapting event extraction systems to new event types. This paper investigates two kinds of bootstrapping methods used for event extraction: the document-centric and similarity-centric approaches, and proposes a filtered ranking method that combines the advantages of the two. We use a range of extraction tasks to compare the ...

Knowledge-Driven Event Extraction in Russian: Corpus-Based Linguistic Resources

Automatic event extraction from text is an important step in knowledge acquisition and knowledge base population. Manual work in the development of an extraction system is indispensable, either in corpus annotation or in vocabulary and pattern creation for a knowledge-based system. Recent work has focused on adapting existing systems (for extraction from English texts) to new domains. Even...

Building Support Tools for Russian-Language Information Extraction

There is currently a paucity of publicly available NLP tools to support analysis of Russian-language text. This especially concerns higher-level applications, such as Information Extraction. We present work on tools for information extraction from text in Russian in the domain of on-line news. On the lower level we employ the AOT toolkit for natural language processing, which provides modules f...

Knowledge-Rich Context Candidate Extraction and Ranking with KnowPipe

This paper presents ongoing PhD thesis work dealing with the extraction of knowledge-rich contexts from text corpora for terminographic purposes. Although notable progress in the field has been made over recent years, there is as yet no methodology or integrated workflow that is able to deal with multiple, typologically different languages and different domains, and that can be handled by non-expe...

Publication date: 2013